Abstract

An oblivious subspace embedding (OSE) given some parameters $\varepsilon, d$ is a distribution $\mathcal{D}$ over
matrices $\Pi \in \mathbb{R}^{m \times n}$ such that for any linear subspace $W \subseteq \mathbb{R}^n$ with $\dim(W) = d$ it holds that

$$\mathbb{P}_{\Pi \sim \mathcal{D}}\left(\forall x \in W,\ \|\Pi x\|_2 \in (1 \pm \varepsilon)\|x\|_2\right) > 2/3.$$

We show an OSE exists with $m = O(d^2/\varepsilon^2)$ and where every $\Pi$ in the support of $\mathcal{D}$ has exactly
$s = 1$ non-zero entries per column. This improves the previously best known bound in [Clarkson-Woodruff,
arXiv abs/1207.6365]. Our quadratic dependence on $d$ is optimal for any OSE with
$s = 1$ [Nelson-Nguyen, 2012]. We also give two OSE's, which we call Oblivious Sparse Norm-Approximating
Projections (OSNAPs), that both allow the parameter settings $m = \tilde{O}(d/\varepsilon^2)$
and $s = \mathrm{polylog}(d)/\varepsilon$, or $m = O(d^{1+\gamma}/\varepsilon^2)$ and $s = O(1/\varepsilon)$ for any constant $\gamma > 0$.^1 This
$m$ is nearly optimal since $m \ge d$ is required simply to ensure that no non-zero vector of $W$ lands in the
kernel of $\Pi$. These are the first constructions with $m = o(d^2)$ to have $s = o(d)$. In fact, our
OSNAPs are nothing more than the sparse Johnson-Lindenstrauss matrices of [Kane-Nelson,
SODA 2012]. Our analyses all yield OSE's that are sampled using either $O(1)$-wise or $O(\log d)$-wise
independent hash functions, which provides some efficiency advantages over previous work
for turnstile streaming applications. Our main result is essentially a Bai-Yin type theorem in
random matrix theory and is likely to be of independent interest: i.e. we show that for any
$U \in \mathbb{R}^{n \times d}$ with orthonormal columns and random sparse $\Pi$, all singular values of $\Pi U$ lie in
$[1 - \varepsilon, 1 + \varepsilon]$ with good probability.

Plugging OSNAPs into known algorithms for numerical linear algebra problems such as approximate
least squares regression, low rank approximation, and approximating leverage scores
implies faster algorithms for all these problems. For example, for the approximate least squares
regression problem of computing $x$ that minimizes $\|Ax - b\|_2$ up to a constant factor, our
embeddings imply a running time of $\tilde{O}(\mathrm{nnz}(A) + r^\omega)$, which is essentially the best bound one could
hope for (up to logarithmic factors). Here $r = \mathrm{rank}(A)$, $\mathrm{nnz}(\cdot)$ counts non-zero entries, and $\omega$ is
the exponent of matrix multiplication. Previous algorithms had a worse dependence on $r$.
^1 We say $g = \tilde{\Omega}(f)$ when $g = \Omega(f/\mathrm{polylog}(f))$, $g = \tilde{O}(f)$ when $g = O(f \cdot \mathrm{polylog}(f))$, and $g = \tilde{\Theta}(f)$ when $g = \tilde{\Omega}(f)$
and $g = \tilde{O}(f)$ simultaneously.
1 Introduction



There has been much recent work on applications of dimensionality reduction to handling large
datasets. Typically special features of the data such as low "intrinsic" dimensionality, or sparsity,
are exploited to reduce the volume of data before processing, thus speeding up analysis time. One
success story of this approach is the application of fast algorithms for the Johnson-Lindenstrauss
lemma [JL84], which allows one to reduce the dimension of a set of vectors while preserving all
pairwise distances. There have been two popular lines of work in this area: one focusing on fast
embeddings for all vectors [AC09, AL09, AL11, HV11, KMR12, KW11, Vyb11], and one focusing on
fast embeddings specifically for sparse vectors [Ach03, BOR10, DKS10, KN10, KN12].

In this work we focus on the problem of constructing an oblivious subspace embedding ( OSE) 
[Sar06] and on applications of these embeddings. Roughly speaking, the problem is to design a
data-independent distribution over linear mappings such that when data come from an unknown 
low-dimensional subspace, they are reduced to roughly their true dimension while their structure 
(all distances in the subspace in this case) is preserved at the same time. It can be seen as 
a continuation of the approach based on the Johnson-Lindenstrauss lemma to subspaces. Here 
we focus on the setting of sparse inputs, where it is important that the algorithms take time 
proportional to the input sparsity. These embeddings have found applications in numerical linear 
algebra problems such as least squares regression, low rank approximation, and approximating 
leverage scores [CW09, CW12, DMIMW12, NDT09, Sar06, Tro11]. We refer the interested reader to
the surveys [HMT11, Mah11] for an overview of this area.

Throughout this document we use $\|\cdot\|$ to denote the $\ell_2$ norm in the case of vector arguments, and the
$\ell_2 \to \ell_2$ operator norm in the case of matrix arguments. Recall the definition of the OSE problem.

Definition 1. The oblivious subspace embedding problem is to design a distribution $\mathcal{D}$ over $m \times n$
matrices $\Pi$ such that for any $d$-dimensional subspace $W \subseteq \mathbb{R}^n$, with probability at least $2/3$ over
the choice of $\Pi \sim \mathcal{D}$, the following inequalities hold for all $x \in W$ simultaneously:

$$(1 - \varepsilon)\|x\| \le \|\Pi x\| \le (1 + \varepsilon)\|x\|.$$

Here $n, d, \varepsilon, \delta$ are given parameters of the problem and we would like $m$ as small as possible.

OSE's were first introduced in [Sar06] as a means to obtain fast randomized algorithms for
several numerical linear algebra problems. To see the connection, consider for example the least
squares regression problem of computing $\mathrm{argmin}_{x \in \mathbb{R}^d} \|Ax - b\|$ for some $A \in \mathbb{R}^{n \times d}$. Suppose
$\Pi \in \mathbb{R}^{m \times n}$ preserves the $\ell_2$ norm up to $1 \pm \varepsilon$ of all vectors in the subspace spanned by $b$ and the columns
of $A$. Then computing $\mathrm{argmin}_x \|\Pi A x - \Pi b\|$ instead gives a solution that is within $1 + O(\varepsilon)$ of optimal.
Since the subspace being preserved has dimension at most $r + 1 \le d + 1$, where $r = \mathrm{rank}(A)$,
one only needs $m = f(r + 1, \varepsilon)$ for whatever function $f$ is achievable in some OSE construction.
Thus the running time for approximate $n \times d$ regression becomes that for $f(r, \varepsilon) \times d$ regression,
plus an additive term for the time required to compute $\Pi A$, $\Pi b$. Even if $A$ has full column rank
and $r = d$ this is still a gain for instances with $n \gg d$. Also note that the $2/3$ success probability
guaranteed by Definition 1 can be amplified to $1 - \delta$ by running this procedure $O(\log(1/\delta))$ times
with independent randomness and taking the best $x$ found in any run.
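
The approximation claim for sketched regression follows from a short chain of inequalities, sketched here under the assumption that $\Pi$ satisfies the guarantee of Definition 1 on the span of $b$ and the columns of $A$. The names $\tilde{x}$ and $x^*$ are introduced only for this illustration: $\tilde{x}$ minimizes the sketched objective and $x^*$ minimizes the original one.

```latex
% Both A\tilde{x} - b and Ax^* - b lie in the preserved subspace.
\begin{align*}
\|A\tilde{x} - b\|
  &\le \frac{1}{1-\varepsilon}\,\|\Pi(A\tilde{x} - b)\|
     && \text{(lower bound of Definition 1)} \\
  &\le \frac{1}{1-\varepsilon}\,\|\Pi(Ax^* - b)\|
     && \text{($\tilde{x}$ minimizes the sketched objective)} \\
  &\le \frac{1+\varepsilon}{1-\varepsilon}\,\|Ax^* - b\|
     && \text{(upper bound of Definition 1)} .
\end{align*}
% For \varepsilon \le 1/2 we have (1+\varepsilon)/(1-\varepsilon) \le 1 + 4\varepsilon,
% so the sketched solution is within a 1 + O(\varepsilon) factor of optimal.
```

Rescaling $\varepsilon$ by a constant recovers a clean $1 + \varepsilon$ guarantee at the cost of a constant factor in $m$.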

Naively there is no gain from the above approach since the time to compute $\Pi A$ could be as
large as matrix multiplication between an $m \times n$ and $n \times d$ matrix. Since $m \ge d$ in any OSE, this
is $O(nd^{\omega - 1})$ time where $\omega < 2.373\ldots$ [Wil12] is the exponent of square matrix multiplication, and
exact least squares regression can already be computed in this time bound. The work of [Sar06]
overcame this barrier by choosing $\Pi$ to be a special structured matrix, with the property that $\Pi A$
can be computed in time $O(nd \log n)$ (see also [Tro11]). This matrix $\Pi$ was the Fast Johnson-Lindenstrauss
Transform of [AC09], which has the property that $\Pi x$ can be computed in roughly
$O(n \log n)$ time for any $x \in \mathbb{R}^n$. Thus, multiplying $\Pi A$ by iterating over columns of $A$ gives the
desired speedup.

The $O(nd \log n)$ running time of the above scheme to compute $\Pi A$ seems almost linear, and
thus nearly optimal, since the input size is already $nd$ to describe $A$. While this is true for dense $A$,
in many practical instances one often expects the input matrix $A$ to be sparse, in which case linear
time in the input description actually means $O(\mathrm{nnz}(A))$, where $\mathrm{nnz}(\cdot)$ is the number of non-zero
entries. For example consider the case of $A$ being the Netflix matrix, where $A_{i,j}$ is user $i$'s score for
movie $j$: $A$ is very sparse since most users do not watch, let alone score, most movies [ZWSP08].

In a recent beautiful and surprising work, [CW12] showed that there exist OSE's with $m =
\mathrm{poly}(d/\varepsilon)$, and where every matrix $\Pi$ in the support of the distribution is very sparse: even with
only $s = 1$ non-zero entries per column! Thus one can transform, for example, an $n \times d$ least
squares regression problem into a $\mathrm{poly}(d/\varepsilon) \times d$ regression problem in $\mathrm{nnz}(A)$ time. They gave two
sparse OSE constructions: one with $m = \tilde{O}(d^4/\varepsilon^4)$, $s = 1$, and another with $m = \tilde{O}(d^2/\varepsilon^4)$, $s =
O((\log d)/\varepsilon)$.^2 The second construction is advantageous when $d$ is larger as a function of $n$ and one
is willing to slightly worsen the $\mathrm{nnz}(A)$ term in the running time for a gain in the input size of the
final regression problem.

We also remark that the analyses given of both constructions in [CW12] require $\Omega(d)$-wise
independent hash functions, so that from the $O(d)$-wise independent seed used to generate $\Pi$
naively one needs an additive $\Omega(d)$ time to identify the non-zero entries in each column just to
evaluate the hash function. In streaming applications this can be improved to additive $\tilde{O}(\log^2 d)$
time using fast multipoint evaluation of polynomials (see [KNPW11, Remark 16]), though ideally
if $s = 1$ one could hope for a construction that allows one to find, for any column, the non-zero
entry in that column in constant time given only a short seed that specifies $\Pi$ (i.e. without writing
down $\Pi$ explicitly in memory, which could be prohibitively expensive for $n$ large in applications
such as streaming and out-of-core numerical linear algebra). Recall that in the entry-wise turnstile
streaming model, $A$ receives entry-wise updates of the form $((i,j),v)$, which cause the change
$A_{i,j} \leftarrow A_{i,j} + v$. Updating the embedding thus amounts to adding $v$ times the $i$th column of $\Pi$ to the
$j$th column of $\Pi A$, which should ideally take $O(s)$ time and not $O(s) + \tilde{O}(\log^2 d)$.
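
To make the update cost concrete, the following is a minimal Python sketch of maintaining $\Pi A$ under turnstile updates when $s = 1$. It assumes, purely for illustration, that $h$ and $\sigma$ are stored as fully random lookup tables rather than drawn from the limited-independence families discussed in the text; the function names `make_sketch_params`, `apply_update`, and `sketch_directly` are hypothetical.

```python
import random

def make_sketch_params(n, m, seed=0):
    """Draw h: [n] -> [m] and sigma in {-1, +1}^n for an s = 1 sketch.

    Assumption: fully random tables stand in for the limited-independence
    hash families discussed in the text.
    """
    rng = random.Random(seed)
    h = [rng.randrange(m) for _ in range(n)]
    sigma = [rng.choice((-1, 1)) for _ in range(n)]
    return h, sigma

def apply_update(sketch, h, sigma, i, j, v):
    """Process the turnstile update A[i][j] += v on the sketch Pi*A.

    Column i of Pi has its single non-zero sigma[i] in row h[i], so only
    entry (h[i], j) of Pi*A changes: constant work per update.
    """
    sketch[h[i]][j] += sigma[i] * v

def sketch_directly(A, h, sigma, m):
    """Compute Pi*A from scratch, for comparison with the streamed sketch."""
    d = len(A[0])
    out = [[0] * d for _ in range(m)]
    for i, row in enumerate(A):
        for j, a in enumerate(row):
            out[h[i]][j] += sigma[i] * a
    return out

# Stream 200 integer updates into an initially zero 50 x 4 matrix.
n, d, m = 50, 4, 16
h, sigma = make_sketch_params(n, m, seed=1)
A = [[0] * d for _ in range(n)]
streamed = [[0] * d for _ in range(m)]
rng = random.Random(2)
for _ in range(200):
    i, j, v = rng.randrange(n), rng.randrange(d), rng.choice((-2, -1, 1, 3))
    A[i][j] += v
    apply_update(streamed, h, sigma, i, j, v)
```

Because the sketch is linear, the streamed result agrees with sketching the final matrix in one pass, which is what `sketch_directly` computes.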

In the following paragraph we let $S_\Pi$ be the space required to store $\Pi$ implicitly (e.g. store
the seed to some hash function that specifies $\Pi$). We let $t_c$ be the running time required by an
algorithm which, given a column index and the length-$S_\Pi$ seed specifying $\Pi$, returns the list of all
non-zeroes in that column in $\Pi$.

Our Main Contribution: We give an improved analysis of the $s = 1$ OSE in [CW12] and show
that it actually achieves $m = O(d^2/\varepsilon^2)$, $s = 1$. Our analysis is near-optimal since $m = \Omega(d^2)$ is
required for any OSE with $s = 1$ [NN12]. Furthermore, for this construction we show $t_c = O(1)$, $S_\Pi =
O(\log(nd))$. We also show that the two sparse Johnson-Lindenstrauss constructions of [KN12] both

^2 Recently, after sharing the statement of our bounds with the authors of [CW12], independently of our methods
they have been able to push their own methods further to obtain $m = O((d^2/\varepsilon^2)\log^6(d/\varepsilon))$ with $s = 1$, nearly
matching our bound, though only for the $s = 1$ case. This improves the two bounds in the topmost row of Figure 1
under the [CW12] reference to come within polylog $d$ or polylog $k$ factors of the two bounds in our topmost row.
reference    regression                                       leverage scores                           low rank approximation

[CW12]       $O(\mathrm{nnz}(A)) + \tilde{O}(d^5)$             $\tilde{O}(\mathrm{nnz}(A) + r^3)$         $O(\mathrm{nnz}(A)) + \tilde{O}(nk^5)$
             $O(\mathrm{nnz}(A)\log n) + \tilde{O}(r^3)$                                                  $O(\mathrm{nnz}(A)\log k) + \tilde{O}(nk^2)$

this work    $O(\mathrm{nnz}(A) + d^3 \log d)$                 $\tilde{O}(\mathrm{nnz}(A) + r^\omega)$    $O(\mathrm{nnz}(A)) + \tilde{O}(nk^2)$
             $\tilde{O}(\mathrm{nnz}(A) + r^\omega)$                                                      $O(\mathrm{nnz}(A)\log^{O(1)} k) + \tilde{O}(nk^{\omega-1})$
                                                                                                          $O(\mathrm{nnz}(A)) + \tilde{O}(nk^{\omega-1+\gamma})$

Figure 1: The improvement gained in running times by using our OSE's. Dependence on $\varepsilon$ suppressed
for readability; see Section 3 for dependence.

yield OSE's that allow for the parameter settings $m = \tilde{O}(d/\varepsilon^2)$, $s = \mathrm{polylog}(d)/\varepsilon$, $t_c = \tilde{O}(s)$, $S_\Pi =
O(\log d \log(nd))$ or $m = O(d^{1+\gamma}/\varepsilon^2)$, $s = O_\gamma(1/\varepsilon)$, $t_c = O((\log d)/\varepsilon)$, $S_\Pi = O(\log d \log(nd))$ for any
desired constant $\gamma > 0$. This $m$ is nearly optimal since $m \ge d$ is required simply to ensure that no
non-zero vector in the subspace lands in the kernel of $\Pi$. Plugging our improved OSE's into previous
work implies faster algorithms for several numerical linear algebra problems, such as approximate
least squares regression, low rank approximation, and approximating leverage scores. We remark
that both of the OSE's in this work and [CW12] with $s \gg 1$ have the added benefit of preserving
any subspace with $1/\mathrm{poly}(d)$, and not just constant, failure probability.

1.1 Problem Statements and Bounds 

We now formally define all numerical linear algebra problems we consider. Plugging our new
OSE's into previous algorithms for the above problems yields the bounds in Figure 1; the value $r$
used in bounds denotes $\mathrm{rank}(A)$.

Approximating Leverage Scores: A $d$-dimensional subspace $W \subseteq \mathbb{R}^n$ can be written as $W =
\{x : \exists y \in \mathbb{R}^d, x = Uy\}$ for some $U \in \mathbb{R}^{n \times d}$ with orthonormal columns. Taking $W$ to be the column
space of $A$, the squared Euclidean norms of rows of $U$ are unique up to permutation, i.e. they depend
only on $A$, and are known as the leverage scores of $A$. Given $A$, we would like to output a list of its
leverage scores up to $1 \pm \varepsilon$.

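Least Squares Regression: Given $A \in \mathbb{R}^{n \times d}$, $b \in \mathbb{R}^n$, compute $\tilde{x} \in \mathbb{R}^d$ so that $\|A\tilde{x} - b\| \le
(1 + \varepsilon) \cdot \min_{x \in \mathbb{R}^d} \|Ax - b\|$.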

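Low Rank Approximation: Given $A \in \mathbb{R}^{n \times d}$ and integer $k > 0$, compute $\tilde{A}_k \in \mathbb{R}^{n \times d}$ with
$\mathrm{rank}(\tilde{A}_k) \le k$ so that $\|A - \tilde{A}_k\|_F \le (1 + \varepsilon) \cdot \min_{\mathrm{rank}(A_k) \le k} \|A - A_k\|_F$, where $\|\cdot\|_F$ is Frobenius norm.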

1.2 Our Construction and Techniques 

The $s = 1$ construction is simply the TZ sketch [TZ12]. This matrix $\Pi \in \mathbb{R}^{m \times n}$ is specified by a random
hash function $h : [n] \to [m]$ and a random $\sigma \in \{-1, 1\}^n$. For each $i \in [n]$ we set $\Pi_{h(i),i} = \sigma_i$, and
every other entry in $\Pi$ is set to zero. Observe any $d$-dimensional subspace $W \subseteq \mathbb{R}^n$ can be written
as $W = \{x : \exists y \in \mathbb{R}^d, x = Uy\}$ for some $U \in \mathbb{R}^{n \times d}$ with orthonormal columns. The analysis of the
$s = 1$ construction in [CW12] worked roughly as follows: let $I \subset [n]$ denote the set of "heavy" rows,
i.e. those rows $u_i$ of $U$ where $\|u_i\|$ is "large". We write $x = x_I + x_{[n] \setminus I}$, where $x_S$ for a set $S$ denotes
$x$ with all coordinates in $[n] \setminus S$ zeroed out. Then $\|\Pi x\|^2 = \|\Pi x_I\|^2 + \|\Pi x_{[n] \setminus I}\|^2 + 2\langle \Pi x_I, \Pi x_{[n] \setminus I}\rangle$. The
argument in [CW12] conditioned on $I$ being perfectly hashed by $h$ so that $\|\Pi x_I\|^2$ is preserved exactly.
Using an approach in [KN10, KN12] based on the Hanson-Wright inequality [HW71] together with
a net argument, it was argued that $\|\Pi x_{[n] \setminus I}\|^2$ is preserved simultaneously for all $x \in W$; this step
required $\Omega(d)$-wise independence to union bound over the net. A simpler concentration argument
was used to handle the $\langle \Pi x_I, \Pi x_{[n] \setminus I}\rangle$ term. The construction in [CW12] with smaller $m$ and larger $s$
followed a similar but more complicated analysis; that construction involved hashing into buckets
and using the sparse Johnson-Lindenstrauss matrices of [KN12] in each bucket.

Our analysis is completely different. First, just as in the TZ sketch's application to $\ell_2$ estimation
in data streams, we only require $h$ to be pairwise independent and $\sigma$ to be 4-wise independent.
Our observation is simple: a matrix $\Pi$ preserving the Euclidean norm of all vectors $x \in W$ up to
$1 \pm \varepsilon$ is equivalent to the statement $\|\Pi U y\| = (1 \pm \varepsilon)\|y\|$ simultaneously for all $y \in \mathbb{R}^d$. This is
equivalent to all singular values of $\Pi U$ lying in the interval $[1 - \varepsilon, 1 + \varepsilon]$.^3 Write $S = (\Pi U)^* \Pi U$, so
that we want to show all eigenvalues of $S$ lie in $[(1 - \varepsilon)^2, (1 + \varepsilon)^2]$. We can trivially write
$S = I + (S - I)$, and thus by Weyl's inequality (see a statement in Section 2) all eigenvalues of $S$
are $1 \pm \|S - I\|$. We thus show that $\|S - I\|$ is small with good probability. By Markov's inequality

$$\mathbb{P}(\|S - I\| \ge t) = \mathbb{P}(\|S - I\|^2 \ge t^2) \le t^{-2} \cdot \mathbb{E}\|S - I\|^2 \le t^{-2} \cdot \mathbb{E}\|S - I\|_F^2.$$

Bounding this latter quantity is a simple calculation and fits in under a page (Theorem 3).

The two constructions with smaller $m \approx d/\varepsilon^2$ are the sparse Johnson-Lindenstrauss matrices
of [KN12]. In particular, the only properties we need from our OSE in our analyses are the following.
Let each matrix in the support of the OSE have entries in $\{0, 1/\sqrt{s}, -1/\sqrt{s}\}$. For a randomly drawn
$\Pi$, let $\delta_{i,j}$ be an indicator random variable for the event $\Pi_{i,j} \ne 0$, and write $\Pi_{i,j} = \delta_{i,j}\sigma_{i,j}/\sqrt{s}$,
where the $\sigma_{i,j}$ are random signs. Then the properties we need are

• For any $j \in [n]$, $\sum_{i=1}^m \delta_{i,j} = s$ with probability 1.

• For any $S \subseteq [m] \times [n]$, $\mathbb{E} \prod_{(i,j) \in S} \delta_{i,j} \le (s/m)^{|S|}$.

The second property says the $\delta_{i,j}$ are negatively correlated. We call any matrix drawn from an
OSE with the above properties an oblivious sparse norm-approximating projection (OSNAP).

The work of [KN12] gave two OSNAP distributions, either of which suffices for our current
OSE problem. In the first construction, each column is chosen to have exactly $s$ non-zero entries
in random locations, each equal to $\pm 1/\sqrt{s}$ uniformly at random. For our purposes the signs
$\sigma_{i,j}$ need only be $O(\log d)$-wise independent, and each column can be specified by a $O(\log d)$-wise
independent permutation, and the seeds specifying the permutations in different columns need only
be $O(\log d)$-wise independent. In the second construction we pick hash functions $h : [n] \times [s] \to
[m/s]$, $\sigma : [n] \times [s] \to \{-1, 1\}$, both $O(\log d)$-wise independent, and thus each representable using
$O(\log d \log nd)$ random bits. For each $(i,j) \in [n] \times [s]$ we set $\Pi_{(j-1)(m/s) + h(i,j),\, i} = \sigma(i,j)/\sqrt{s}$, and all
other entries in $\Pi$ are set to zero. Note also that the TZ sketch is itself an OSNAP with $s = 1$.
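
As an illustration of the block-style construction, here is a minimal Python sketch that samples such a matrix column by column and applies it to a vector. It makes two simplifying assumptions that are not part of the analysis: the hash values and signs are fully random rather than $O(\log d)$-wise independent, and $s$ divides $m$. The helper names `sample_osnap_columns` and `apply_osnap` are hypothetical.

```python
import math
import random

def sample_osnap_columns(n, m, s, seed=0):
    """Sample an m x n block-construction matrix, stored column by column.

    Rows are split into s blocks of m // s rows; column i gets one entry
    +-1/sqrt(s) in a random row of each block. Entry i of the returned
    list holds the (row, value) pairs of column i.
    Assumptions: fully random choices replace the limited-independence
    hash functions of the text, and s divides m.
    """
    if m % s != 0:
        raise ValueError("this sketch assumes s divides m")
    rng = random.Random(seed)
    block = m // s
    scale = 1.0 / math.sqrt(s)
    columns = []
    for _ in range(n):
        col = []
        for j in range(s):
            row = j * block + rng.randrange(block)   # one row inside block j
            col.append((row, rng.choice((-1.0, 1.0)) * scale))
        columns.append(col)
    return columns

def apply_osnap(columns, m, x):
    """Compute Pi x in time proportional to s times the number of non-zeros of x."""
    out = [0.0] * m
    for xi, col in zip(x, columns):
        if xi != 0.0:
            for row, val in col:
                out[row] += val * xi
    return out

cols = sample_osnap_columns(n=30, m=12, s=3, seed=7)
e0 = [1.0] + [0.0] * 29
image = apply_osnap(cols, 12, e0)
```

With $s = 1$ the same routine yields a matrix with a single $\pm 1$ per column, matching the TZ sketch described above. Every column has exactly $s$ non-zeros and unit Euclidean norm by construction, which is the first OSNAP property.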

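Just as in the TZ sketch, it suffices to show some tail bound: that $\mathbb{P}(\|S - I\| > \varepsilon')$ is small for
some $\varepsilon' = O(\varepsilon)$, where $S = (\Pi U)^* \Pi U$. Note that if the eigenvalues of $S - I$ are $\lambda_1, \ldots, \lambda_d$, then the
eigenvalues of $(S - I)^\ell$ are $\lambda_1^\ell, \ldots, \lambda_d^\ell$. Thus for $\ell$ even, $\mathrm{tr}((S - I)^\ell) = \sum_{i=1}^d \lambda_i^\ell$ is an upper bound
on $\|S - I\|^\ell$. Thus by Markov's inequality with $\ell$ even,

$$\mathbb{P}(\|S - I\| \ge t) = \mathbb{P}(\|S - I\|^\ell \ge t^\ell) \le t^{-\ell} \cdot \mathbb{E}\|S - I\|^\ell \le t^{-\ell} \cdot \mathbb{E}\,\mathrm{tr}((S - I)^\ell). \quad (1)$$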
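
A small worked example may help to see why evenness of $\ell$ matters in the trace bound. The eigenvalues below are chosen purely for illustration and do not come from any particular $\Pi$ or $U$.

```latex
% Suppose S - I is symmetric with eigenvalues \lambda_1 = 0.3, \lambda_2 = -0.5,
% so that \|S - I\| = \max_i |\lambda_i| = 0.5.
\begin{align*}
\ell = 2:&\quad \|S - I\|^2 = 0.25 \;\le\; \mathrm{tr}((S-I)^2) = 0.09 + 0.25 = 0.34, \\
\ell = 4:&\quad \|S - I\|^4 = 0.0625 \;\le\; \mathrm{tr}((S-I)^4) = 0.0081 + 0.0625 = 0.0706, \\
\ell = 3:&\quad \|S - I\|^3 = 0.125 \;>\; \mathrm{tr}((S-I)^3) = 0.027 - 0.125 = -0.098 .
\end{align*}
% For even \ell every \lambda_i^\ell is nonnegative, so the largest one is at most the sum;
% for odd \ell negative eigenvalues can cancel and the bound fails.
```

In the other direction $\mathrm{tr}((S - I)^\ell) \le d \cdot \|S - I\|^\ell$ for even $\ell$, so passing to the trace loses at most a factor $d^{1/\ell}$ in the norm estimate.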

^3 Recall that the singular values of a (possibly rectangular) matrix $B$ are the square roots of the eigenvalues of
$B^* B$, where $(\cdot)^*$ denotes conjugate transpose.
Our proof works by expanding the expression $\mathrm{tr}((S - I)^\ell)$ and computing its expectation.
This expression is a sum of exponentially many monomials, each involving a product of $\ell$ terms.
Without delving into technical details at this point, each such monomial can be thought of as being
in correspondence with some undirected multigraph (see the dot product multigraphs in the proof
of Theorem 9). We group monomials corresponding to the same graph, bound the contribution
from each graph separately, then sum over all graphs. Multigraphs whose edges all have even
multiplicity turn out to be easier to handle (Lemma 10). However most graphs $G$ do not have this
property. Informally speaking, the contribution of a graph turns out to be related to the product
over its edges of the contribution of that edge. Let us informally call this "contribution" $F(G)$.
Thus if $E' \subset E$ is a subset of the edges of $G$, we can write $F(G) \le F((G|_{E'})^2)/2 + F((G|_{E \setminus E'})^2)/2$
by AM-GM, where squaring a multigraph means duplicating every edge, and $G|_{E'}$ is $G$ with all
edges in $E \setminus E'$ removed. This reduces back to the case of even edge multiplicities, but unfortunately
the bound we desire on $F(G)$ depends exponentially on the number of connected components of $G$.
Thus this step is bad, since if $G$ is connected, then one of $G|_{E'}$, $G|_{E \setminus E'}$ can have many connected
components for any choice of $E'$. For example if $G$ is a cycle on $N$ vertices, for $E'$ a single edge
almost every vertex in $G|_{E'}$ is in its own connected component, and even if $E'$ is every odd-indexed
edge then the number of components blows up to $N/2$. Our method to overcome this is to show
that any $F(G)$ is bounded by some $F(G')$ with the property that every connected component of
$G'$ has two edge-disjoint spanning trees. We then put one such spanning tree into $E'$ for each
component, so that $G'|_{E \setminus E'}$ and $G'|_{E'}$ both have the same number of connected components as $G'$.

Our approach follows the classical moment method in random matrix theory; see [Tao12, Section
2] or [Ver12] for introductions to this area. In particular, our approach is inspired by one taken by
Bai and Yin [BY93], who in our notation were concerned with the case $n = d$, $U = I$, $\Pi$ dense.
Most of the complications in our proof arise because $U$ is not the identity matrix, so that rows of
$U$ are not orthogonal. For example, in the case of $U$ having orthogonal rows all graphs $G$ in the
last paragraph have no edges other than self-loops and are trivial to analyze.

2 Analysis 

In this section let the orthonormal columns of $U \in \mathbb{R}^{n \times d}$ be denoted $u^1, \ldots, u^d$. Recall our goal is
to show that all singular values of $\Pi U$ lie in the interval $[1 - \varepsilon, 1 + \varepsilon]$ with probability $1 - \delta$ over
the choice of $\Pi$ as long as $s, m$ are sufficiently large. We assume $\Pi$ is an OSNAP with sparsity $s$.
As in [BY93] we make use of Weyl's inequality (see a proof in [Tao12, Section 1.3]).

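Theorem 2 (Weyl's inequality). Let $M, H, P$ be $n \times n$ Hermitian matrices with $M = H + P$, where $M$ has eigenvalues
$\mu_1 \ge \cdots \ge \mu_n$, $H$ has eigenvalues $\nu_1 \ge \cdots \ge \nu_n$, and $P$ has eigenvalues $\rho_1 \ge \cdots \ge \rho_n$. Then
$\forall\, 1 \le i \le n$, it holds that $\nu_i + \rho_n \le \mu_i \le \nu_i + \rho_1$.

Let $S = (\Pi U)^* \Pi U$. Letting $I$ be the $d \times d$ identity matrix, Weyl's inequality with $M = S$,
$H = (1 + \varepsilon^2) I$, and $P = S - (1 + \varepsilon^2) I$ implies that all the eigenvalues of $S$ lie in the range
$[1 + \varepsilon^2 + \lambda_{\min}(P), 1 + \varepsilon^2 + \lambda_{\max}(P)] \subseteq [1 + \varepsilon^2 - \|P\|, 1 + \varepsilon^2 + \|P\|]$, where $\lambda_{\min}(M)$ (resp. $\lambda_{\max}(M)$)
is the smallest (resp. largest) eigenvalue of $M$. Since $\|P\| \le \varepsilon^2 + \|S - I\|$, it thus suffices to show

$$\mathbb{P}(\|S - I\| > 2\varepsilon - \varepsilon^2) < \delta, \quad (2)$$

since $\|P\| \le 2\varepsilon$ implies that all eigenvalues of $S$ lie in $[(1 - \varepsilon)^2, (1 + \varepsilon)^2]$.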
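Before proceeding with our proofs below, observe that for all $k, k'$

$$S_{k,k'} = \frac{1}{s} \sum_{r=1}^m \left(\sum_{i=1}^n \delta_{r,i}\sigma_{r,i} u_i^k\right)\left(\sum_{i=1}^n \delta_{r,i}\sigma_{r,i} u_i^{k'}\right)$$

$$= \frac{1}{s} \sum_{i=1}^n u_i^k u_i^{k'} \cdot \left(\sum_{r=1}^m \delta_{r,i}\right) + \frac{1}{s} \sum_{r=1}^m \sum_{i \ne j} \delta_{r,i}\delta_{r,j}\sigma_{r,i}\sigma_{r,j} u_i^k u_j^{k'}$$

$$= \langle u^k, u^{k'} \rangle + \frac{1}{s} \sum_{r=1}^m \sum_{i \ne j} \delta_{r,i}\delta_{r,j}\sigma_{r,i}\sigma_{r,j} u_i^k u_j^{k'},$$

where the second line uses $\delta_{r,i}^2 = \delta_{r,i}$ and $\sigma_{r,i}^2 = 1$, and the third uses $\sum_{r=1}^m \delta_{r,i} = s$.

Noting $\langle u^k, u^k \rangle = \|u^k\|^2 = 1$ and $\langle u^k, u^{k'} \rangle = 0$ for $k \ne k'$, we have for all $k, k'$

$$(S - I)_{k,k'} = \frac{1}{s} \sum_{r=1}^m \sum_{i \ne j} \delta_{r,i}\delta_{r,j}\sigma_{r,i}\sigma_{r,j} u_i^k u_j^{k'}. \quad (3)$$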

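Theorem 3. For $\Pi$ an OSNAP with $s = 1$ and $\varepsilon \in (0, 1)$, with probability at least $1 - \delta$ all singular
values of $\Pi U$ are $1 \pm \varepsilon$ as long as $m \ge \delta^{-1}(d^2 + d)/(2\varepsilon - \varepsilon^2)^2$, $\sigma$ is 4-wise independent, and $h$ is
pairwise independent.

Proof. We show Eq. (2). Our approach is to bound $\mathbb{E}\|S - I\|^2$ then use Markov's inequality. Since

$$\mathbb{P}(\|S - I\| > 2\varepsilon - \varepsilon^2) = \mathbb{P}(\|S - I\|^2 > (2\varepsilon - \varepsilon^2)^2) \le (2\varepsilon - \varepsilon^2)^{-2} \cdot \mathbb{E}\|S - I\|^2 \le (2\varepsilon - \varepsilon^2)^{-2} \cdot \mathbb{E}\|S - I\|_F^2, \quad (4)$$

we can bound $\mathbb{E}\|S - I\|_F^2$ to show Eq. (2). Here $\|\cdot\|_F$ denotes Frobenius norm.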

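Now we bound $\mathbb{E}\|S - I\|_F^2$. We first deal with the diagonal terms of $S - I$. By Eq. (3),

$$\mathbb{E}(S - I)_{k,k}^2 = \sum_{r=1}^m \sum_{i \ne j} \frac{2}{m^2} (u_i^k)^2 (u_j^k)^2 \le \frac{2}{m} \cdot \|u^k\|^4 = \frac{2}{m},$$

and thus the diagonal terms in total contribute at most $2d/m$ to $\mathbb{E}\|S - I\|_F^2$.

We now focus on the off-diagonal terms. By Eq. (3), for $k \ne k'$, $\mathbb{E}(S - I)_{k,k'}^2$ is equal to

$$\frac{1}{m^2} \sum_{r=1}^m \sum_{i \ne j} \left((u_i^k)^2 (u_j^{k'})^2 + u_i^k u_i^{k'} u_j^k u_j^{k'}\right) = \frac{1}{m} \sum_{i \ne j} \left((u_i^k)^2 (u_j^{k'})^2 + u_i^k u_i^{k'} u_j^k u_j^{k'}\right).$$

Noting $0 = \langle u^k, u^{k'} \rangle^2 = \sum_{i=1}^n (u_i^k)^2 (u_i^{k'})^2 + \sum_{i \ne j} u_i^k u_i^{k'} u_j^k u_j^{k'}$ we have that $\sum_{i \ne j} u_i^k u_i^{k'} u_j^k u_j^{k'} \le 0$, so

$$\mathbb{E}(S - I)_{k,k'}^2 \le \frac{1}{m} \sum_{i \ne j} (u_i^k)^2 (u_j^{k'})^2 \le \frac{1}{m} \|u^k\|^2 \cdot \|u^{k'}\|^2 = \frac{1}{m}.$$

Thus summing over $k \ne k'$, the total contribution from off-diagonal terms to $\mathbb{E}\|S - I\|_F^2$ is at
most $d(d - 1)/m$. Thus in total $\mathbb{E}\|S - I\|_F^2 \le (d^2 + d)/m$, and so Eq. (4) and our setting of $m$ give

$$\mathbb{P}(\|S - I\| > 2\varepsilon - \varepsilon^2) \le \frac{1}{(2\varepsilon - \varepsilon^2)^2} \cdot \frac{d^2 + d}{m} \le \delta.$$

■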
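
The second-moment bound in this proof can be checked numerically. The following minimal Python sketch estimates $\mathbb{E}\|S - I\|_F^2$ by Monte Carlo for an $s = 1$ sketch and compares it with $(d^2 + d)/m$. It is an illustration under stated assumptions, not part of the proof: $h$ and $\sigma$ are fully random rather than pairwise and 4-wise independent, $U$ is obtained by Gram-Schmidt on Gaussian vectors, and the parameter values and function names are arbitrary choices.

```python
import random

def orthonormal_columns(n, d, rng):
    """Return d orthonormal vectors in R^n via Gram-Schmidt on Gaussian vectors."""
    basis = []
    while len(basis) < d:
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for b in basis:
            dot = sum(x * y for x, y in zip(v, b))
            v = [x - dot * y for x, y in zip(v, b)]
        norm = sum(x * x for x in v) ** 0.5
        if norm > 1e-9:
            basis.append([x / norm for x in v])
    return basis  # basis[k][i] plays the role of u_i^k

def frob_sq_error(cols, m, rng):
    """Draw one s = 1 sketch Pi and return ||(Pi U)^T (Pi U) - I||_F^2."""
    d, n = len(cols), len(cols[0])
    h = [rng.randrange(m) for _ in range(n)]
    sigma = [rng.choice((-1.0, 1.0)) for _ in range(n)]
    sketched = [[0.0] * m for _ in range(d)]      # sketched[k] = Pi u^k
    for k in range(d):
        for i in range(n):
            sketched[k][h[i]] += sigma[i] * cols[k][i]
    total = 0.0
    for k in range(d):
        for kp in range(d):
            s_entry = sum(a * b for a, b in zip(sketched[k], sketched[kp]))
            total += (s_entry - (1.0 if k == kp else 0.0)) ** 2
    return total

rng = random.Random(0)
n, d, m, trials = 60, 3, 40, 400
U_cols = orthonormal_columns(n, d, rng)
estimate = sum(frob_sq_error(U_cols, m, rng) for _ in range(trials)) / trials
bound = (d * d + d) / m
print(f"Monte Carlo estimate: {estimate:.4f}, bound (d^2 + d)/m: {bound:.4f}")
```

Since the Frobenius norm dominates the operator norm, any run in which the returned value is below $(2\varepsilon - \varepsilon^2)^2$ certifies that all singular values of $\Pi U$ lie in $[1 - \varepsilon, 1 + \varepsilon]$ for that draw.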

Before proving the next theorem, it is helpful to state a few facts that we will repeatedly use.
Recall that $u^i$ denotes the $i$th column of $U$, and we will let $u_i$ denote the $i$th row of $U$.

Lemma 4. $\sum_{k=1}^n u_k u_k^* = I$.

Proof.

$$\left(\sum_{k=1}^n u_k u_k^*\right)_{i,j} = e_i^* \left(\sum_{k=1}^n u_k u_k^*\right) e_j = \sum_{k=1}^n (u_k)_i (u_k)_j = \langle u^i, u^j \rangle,$$

and this inner product is 1 for $i = j$ and 0 otherwise. ■

Lemma 5. For all $i \in [n]$, $\|u_i\| \le 1$.

Proof. We can extend $U$ to some orthogonal matrix $U' \in \mathbb{R}^{n \times n}$ by appending $n - d$ columns. For
the rows $u'_i$ of $U'$ we then have $\|u_i\| \le \|u'_i\| = 1$. ■
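
Both lemmas are easy to verify on a concrete matrix. The snippet below is a small illustrative check in Python on a hand-picked $4 \times 2$ matrix with orthonormal columns; the matrix and the helper names `row_outer_sum` and `row_norms` are chosen only for this example.

```python
def row_outer_sum(U):
    """Return the d x d matrix sum_k u_k u_k^T, where u_k is the k-th row of U."""
    d = len(U[0])
    total = [[0.0] * d for _ in range(d)]
    for row in U:
        for a in range(d):
            for b in range(d):
                total[a][b] += row[a] * row[b]
    return total

def row_norms(U):
    """Return the Euclidean norm of every row of U."""
    return [sum(x * x for x in row) ** 0.5 for row in U]

# Columns (1,1,1,1)/2 and (1,-1,1,-1)/2 are orthonormal in R^4.
U = [[0.5, 0.5],
     [0.5, -0.5],
     [0.5, 0.5],
     [0.5, -0.5]]
gram = row_outer_sum(U)   # Lemma 4: equals the 2 x 2 identity
norms = row_norms(U)      # Lemma 5: every entry is at most 1
```

Here every row has norm $\sqrt{1/2}$, and the sum of the four outer products is exactly the identity because all entries are multiples of $1/4$.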

Theorem 6 ([NW61, Tut61]). A multigraph $G$ has $k$ edge-disjoint spanning trees iff

$$|E_P(G)| \ge k(|P| - 1)$$

for every partition $P$ of the vertex set of $G$, where $E_P(G)$ is the set of edges of $G$ crossing between
two different parts of $P$.

The following corollary is standard, and we will later only need it for the case k = 2. 

Corollary 7. Let $G$ be a multigraph formed by removing at most $k$ edges from a multigraph $G'$
that has edge-connectivity at least $2k$. Then $G$ must have at least $k$ edge-disjoint spanning trees.

Proof. For any partition $P$ of the vertex set with $|P| \ge 2$, each part of $P$ must have at least $2k$ edges
leaving it in $G'$. Since every crossing edge leaves exactly two parts, the number of edges crossing
between parts must be at least $k|P|$ in $G'$, and thus at least
$k|P| - k$ in $G$. Theorem 6 thus implies that $G$ has $k$ edge-disjoint spanning trees. ■

Fact 8. For any matrix $B \in \mathbb{C}^{d \times d}$, $\|B\| = \sup_{\|x\| = \|y\| = 1} x^* B y$.

Proof. We have $\sup_{\|x\| = \|y\| = 1} x^* B y \le \|B\|$ since $x^* B y \le \|x\| \cdot \|B\| \cdot \|y\|$. To show that unit norm
$x, y$ exist which achieve $\|B\|$, let $B = U \Sigma V^*$ be the singular value decomposition of $B$. That is,
$U, V$ are unitary and $\Sigma$ is diagonal with entries $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_d \ge 0$ so that $\|B\| = \sigma_1$. We can
then achieve $x^* B y = \sigma_1$ by letting $x$ be the first column of $U$ and $y$ be the first column of $V$. ■

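Theorem 9. For $\Pi$ an OSNAP with $s = \Theta(\log^3(d/\delta)/\varepsilon)$ and $\varepsilon \in (0,1)$, with probability at least
$1 - \delta$, all singular values of $\Pi U$ are $1 \pm \varepsilon$ as long as $m = \Omega(d \log^8(d/\delta)/\varepsilon^2)$ and $\sigma, h$ are
$\Omega(\log(d/\delta))$-wise independent.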
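Proof. We will again show Eq. (2). Recall that by Eq. (1) we have

$$\mathbb{P}(\|S - I\| \ge t) \le t^{-\ell} \cdot \mathbb{E}\,\mathrm{tr}((S - I)^\ell) \quad (5)$$

for $\ell$ any even integer. We thus proceed by bounding $\mathbb{E}\,\mathrm{tr}((S - I)^\ell)$ then applying Eq. (5).
It is easy to verify by induction on $\ell$ that for any $B \in \mathbb{R}^{n \times n}$ and $\ell \ge 1$,

$$(B^\ell)_{i,j} = \sum_{\substack{t_1, \ldots, t_{\ell+1} \in [n] \\ t_1 = i,\ t_{\ell+1} = j}} \prod_{k=1}^{\ell} B_{t_k, t_{k+1}}, \quad \text{and thus} \quad \mathrm{tr}(B^\ell) = \sum_{\substack{t_1, \ldots, t_{\ell+1} \in [n] \\ t_1 = t_{\ell+1}}} \prod_{k=1}^{\ell} B_{t_k, t_{k+1}}.$$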
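
The trace identity says that $\mathrm{tr}(B^\ell)$ is a sum over closed walks of length $\ell$ on the index set. The short Python sketch below checks this by brute force on a small integer matrix; the matrix $B$ and the helper names are arbitrary choices for illustration.

```python
from itertools import product

def mat_mul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace_of_power(B, ell):
    """Compute tr(B^ell) by repeated matrix multiplication."""
    P = B
    for _ in range(ell - 1):
        P = mat_mul(P, B)
    return sum(P[i][i] for i in range(len(B)))

def trace_by_closed_walks(B, ell):
    """Compute tr(B^ell) as a sum over closed walks t_1 -> ... -> t_ell -> t_1."""
    n = len(B)
    total = 0
    for walk in product(range(n), repeat=ell):
        term = 1
        for k in range(ell):
            # The index after t_ell wraps around to t_1, closing the walk.
            term *= B[walk[k]][walk[(k + 1) % ell]]
        total += term
    return total

B = [[2, -1, 0],
     [1, 3, 1],
     [0, -2, 1]]
```

The walk-based sum has $n^\ell$ terms, which is the "exponentially many monomials" that the proof organizes into graphs rather than enumerating.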

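Applying this identity to $B = S - I$ yields

$$\mathbb{E}\,\mathrm{tr}((S - I)^\ell) = \frac{1}{s^\ell} \cdot \mathbb{E} \sum_{\substack{k_1, k_2, \ldots, k_{\ell+1} \\ k_1 = k_{\ell+1} \\ i_1 \ne j_1, \ldots, i_\ell \ne j_\ell \\ r_1, \ldots, r_\ell}} \prod_{t=1}^{\ell} \delta_{r_t, i_t}\delta_{r_t, j_t}\sigma_{r_t, i_t}\sigma_{r_t, j_t} u_{i_t}^{k_t} u_{j_t}^{k_{t+1}}. \quad (6)$$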

The general strategy to bound the above summation is the following. Let $\Psi$ be the set of all
monomials appearing on the right hand side of Eq. (6). For $\psi \in \Psi$ define $K(\psi) = (k_1, \ldots, k_\ell)$ as
the ordered tuple of $k_t$ values in $\psi$, and similarly define $P(\psi) = ((i_1, j_1), \ldots, (i_\ell, j_\ell))$ and $W(\psi) =
(r_1, \ldots, r_\ell)$. For each $\psi \in \Psi$ we associate a three-layered undirected multigraph $G_\psi$ with labeled
edges and unlabeled vertices. We call these three layers the left, middle, and right layers, and we
refer to vertices in the left layer as left vertices, and similarly for vertices in the other layers. Define
$M(\psi)$ to be the set $\{i_1, \ldots, i_\ell, j_1, \ldots, j_\ell\}$ and define $R(\psi) = \{r_1, \ldots, r_\ell\}$. We define $y = |M(\psi)|$
and $z = |R(\psi)|$. Note it can happen that $y < 2\ell$ if some $i_t = i_{t'}$, $j_t = j_{t'}$, or $i_t = j_{t'}$, and similarly
we may also have $z < \ell$. The graph $G_\psi$ has $x = \ell$ left vertices, $y$ middle vertices corresponding to
the distinct $i_t, j_t$ in $\psi$, and $z$ right vertices corresponding to the distinct $r_t$. For the sake of brevity,
often we refer to the vertex corresponding to $i_t$ (resp. $j_t, r_t$) as simply $i_t$ (resp. $j_t, r_t$). Thus note
that when we refer to for example some vertex $i_t$, it may happen that some other $i_{t'}$ or $j_{t'}$ is also
the same vertex. We now describe the edges of $G_\psi$. For $\psi = \prod_{t=1}^{\ell} \delta_{r_t, i_t}\delta_{r_t, j_t}\sigma_{r_t, i_t}\sigma_{r_t, j_t} u_{i_t}^{k_t} u_{j_t}^{k_{t+1}}$ we
draw $4\ell$ labeled edges in $G_\psi$ with distinct labels in $[4\ell]$. For each $t \in [\ell]$ we draw an edge from the
$t$th left vertex to $i_t$ with label $4(t - 1) + 1$, from $i_t$ to $r_t$ with label $4(t - 1) + 2$, from $r_t$ to $j_t$ with
label $4(t - 1) + 3$, and from $j_t$ to the $(t + 1)$st left vertex with label $4(t - 1) + 4$ (since $k_{\ell+1} = k_1$,
the $(\ell + 1)$st left vertex is the first). Observe that
many different monomials $\psi$ will map to the same graph $G_\psi$; in particular the graph maintains
no information concerning equalities amongst the $k_t$, and the $y$ middle vertices may map to any $y$
distinct values in $[n]$ (and similarly the right vertices may map to any $z$ distinct values in $[m]$). We
handle the right hand side of Eq. (6) by grouping monomials $\psi$ that map to the same graph, bound
the total contribution of a given graph $G$ in terms of its graph structure when summing over all
$\psi$ with $G_\psi = G$, then sum the contributions from all such graphs $G$ combined.

Before continuing further we introduce some more notation then make a few observations. For
a graph $G$ as above, recall $G$ has $4\ell$ edges, and we refer to the distinct edges (ignoring labels) as
bonds. We let $E(G)$ denote the edge multiset of a multigraph $G$ and $B(G)$ denote the bond set.
We refer to the number of bonds a vertex is incident upon as its bond-degree, and the number of
edges as its edge-degree. We do not count self-loops for calculating bond-degree, and we count them
twice for edge-degree. We let $LM(G)$ be the induced multigraph on the left and middle vertices of
$G$, and $MR(G)$ be the induced multigraph on the middle and right vertices. We let $w = w(G)$ be
the number of connected components in $MR(G)$. We let $b = b(G)$ denote the number of bonds in
$MR(G)$ (note $MR(G)$ has $2\ell$ edges, but it may happen that $b < 2\ell$ since $G$ is a multigraph). Given
$G$ we define the undirected dot product multigraph $\widehat{G}$ with vertex set $M(\psi)$. Note every left vertex
of $G$ has edge-degree 2. For each $t \in [\ell]$ an edge $(i, j)$ is drawn in $\widehat{G}$ between the two middle vertices
that the $t$th left vertex is adjacent to (we draw a self-loop on $i$ if $i = j$). We do not label the edges
of $\widehat{G}$, but we label the vertices with distinct labels in $[y]$ in increasing order of when each vertex
was first visited by the natural tour of $G$ (by following edges in increasing label order). We name
$\widehat{G}$ the dot product multigraph since if some left vertex $t$ has its two edges connecting to vertices
$i, j \in [n]$, then summing over $k_t \in [d]$ produces the dot product $\langle u_i, u_j \rangle$.

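Now we make some observations. Due to the random signs $\sigma_{r,i}$, a monomial $\psi$ has expectation
zero unless every bond in $MR(G)$ has even multiplicity, in which case the product of random signs
in $\psi$ is 1. Also, note the expectation of the product of the $\delta_{r,i}$ terms in $\psi$ is at most $(s/m)^b$ by
OSNAP properties. Thus letting $\mathcal{G}$ be the set of all such graphs $G$ with even bond multiplicity in
$MR(G)$ that arise from some monomial $\psi$ appearing in Eq. (6), we have

$$\mathbb{E}\,\mathrm{tr}((S - I)^\ell) \le \frac{1}{s^\ell} \cdot \sum_{G \in \mathcal{G}} \left| \sum_{\psi : G_\psi = G} \mathbb{E} \prod_{t=1}^{\ell} \delta_{r_t, i_t}\delta_{r_t, j_t}\sigma_{r_t, i_t}\sigma_{r_t, j_t} u_{i_t}^{k_t} u_{j_t}^{k_{t+1}} \right|$$

$$\le \frac{1}{s^\ell} \cdot \sum_{G \in \mathcal{G}} \left(\frac{s}{m}\right)^{b} \binom{m}{z} \cdot \left| \sum_{\substack{\psi : G_\psi = G \\ R(\psi) = [z]}} \prod_{t=1}^{\ell} u_{i_t}^{k_t} u_{j_t}^{k_{t+1}} \right|$$

$$= \frac{1}{s^\ell} \cdot \sum_{G \in \mathcal{G}} \left(\frac{s}{m}\right)^{b} \binom{m}{z} \cdot \left| \sum_{\substack{a_1, \ldots, a_y \in [n] \\ \forall i \ne j,\ a_i \ne a_j}} \prod_{\substack{e \in E(\widehat{G}) \\ e = (i,j)}} \langle u_{a_i}, u_{a_j} \rangle \right|. \quad (7)$$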

Before continuing further it will be convenient to introduce a notion we will use in our analysis
called a generalized dot product multigraph. Such a graph $\widehat{G}$ is just as in the case of a dot product
multigraph, except that each edge $e = (i,j)$ is associated with some matrix $M_e$. We call $M_e$ the
edge-matrix of $e$. Also since $\widehat{G}$ is undirected, we can think of an edge $e = (i,j)$ with edge-matrix
$M_e$ also as an edge $(j, i)$, in which case we say its associated edge-matrix is $M_e^*$. We then associate
with $\widehat{G}$ the product

$$\prod_{\substack{e \in \widehat{G} \\ e = (i,j)}} \langle u_{a_i}, M_e u_{a_j} \rangle.$$

Note that a dot product multigraph is simply a generalized dot product multigraph in which $M_e = I$
for all $e$. Also, in such a generalized dot product multigraph, we treat multiedges as representing
the same bond iff the associated edge-matrices are also equal (in general multiedges may have
different edge-matrices).

Lemma 10. Let $H$ be a connected generalized dot product multigraph on vertex set $[N]$ with $E(H) \ne \emptyset$
and where every bond has even multiplicity. Also suppose that for all $e \in E(H)$, $\|M_e\| \le 1$. Define

$$f(H) = \sum_{a_2 = 1}^{n} \cdots \sum_{a_N = 1}^{n} \prod_{\substack{e \in E(H) \\ e = (i,j)}} \langle v_{a_i}, M_e v_{a_j} \rangle,$$

where $v_{a_i} = u_{a_i}$ for $i \ne 1$, and $v_{a_1}$ equals some fixed vector $c$ with $\|c\| \le 1$. Then $f(H) \le \|c\|^2$.

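Proof. Let $\pi$ be some permutation of $\{2, \ldots, N\}$. For a bond $q = (i,j) \in B(H)$, let $2\alpha_q$ denote
the multiplicity of $q$ in $H$. Then by ordering the assignments of the $a_t$ in the summation

$$\sum_{a_2, \ldots, a_N \in [n]} \prod_{\substack{e \in E(H) \\ e = (i,j)}} \langle v_{a_i}, M_e v_{a_j} \rangle$$

according to $\pi$, we obtain the exactly equal expression

$$\sum_{a_{\pi(N)} = 1}^{n} \prod_{\substack{q \in B(H) \\ q = (\pi(N), j) \\ N \le \pi^{-1}(j)}} \langle v_{a_{\pi(N)}}, M_q v_{a_j} \rangle^{2\alpha_q} \cdots \sum_{a_{\pi(2)} = 1}^{n} \prod_{\substack{q \in B(H) \\ q = (\pi(2), j) \\ 2 \le \pi^{-1}(j)}} \langle v_{a_{\pi(2)}}, M_q v_{a_j} \rangle^{2\alpha_q}. \quad (8)$$

Here we have taken the product over $t \le \pi^{-1}(j)$ as opposed to $t < \pi^{-1}(j)$ since there may be
self-loops. By Lemma 5, the fact that $\|c\| \le 1$, and $\|M_q\| \le 1$, we have that for any $i, j$, $\langle v_i, M_q v_j \rangle^2 \le \|v_i\|^2 \cdot \|v_j\|^2 \le 1$,
so we obtain an upper bound on Eq. (8) by replacing each $\langle v_{a_{\pi(t)}}, M_q v_{a_j} \rangle^{2\alpha_q}$ term with $\langle v_{a_{\pi(t)}}, M_q v_{a_j} \rangle^2$.
We can thus obtain the sum

$$\sum_{a_{\pi(N)} = 1}^{n} \prod_{\substack{q \in B(H) \\ q = (\pi(N), j) \\ N \le \pi^{-1}(j)}} \langle v_{a_{\pi(N)}}, M_q v_{a_j} \rangle^{2} \cdots \sum_{a_{\pi(2)} = 1}^{n} \prod_{\substack{q \in B(H) \\ q = (\pi(2), j) \\ 2 \le \pi^{-1}(j)}} \langle v_{a_{\pi(2)}}, M_q v_{a_j} \rangle^{2}, \quad (9)$$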

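which upper bounds Eq. (8). Now note for $2 \le t \le N$ that for any nonnegative integer $\beta_t$ and for
$\{q \in B(H) : q = (\pi(t), j),\ t < \pi^{-1}(j)\}$ non-empty (note the strict inequality $t < \pi^{-1}(j)$),

\begin{align}
\sum_{a_{\pi(t)}=1}^n \|v_{a_{\pi(t)}}\|^{2\beta_t} \cdot \prod_{\substack{q \in B(H) \\ q = (\pi(t), j) \\ t < \pi^{-1}(j)}} \langle v_{a_{\pi(t)}}, M_q v_{a_j} \rangle^2
&\le \sum_{a_{\pi(t)}=1}^n \prod_{\substack{q \in B(H) \\ q = (\pi(t), j) \\ t < \pi^{-1}(j)}} \langle v_{a_{\pi(t)}}, M_q v_{a_j} \rangle^2 \tag{10} \\
&\le \prod_{\substack{q \in B(H) \\ q = (\pi(t), j) \\ t < \pi^{-1}(j)}} \left( \sum_{a_{\pi(t)}=1}^n \langle v_{a_{\pi(t)}}, M_q v_{a_j} \rangle^2 \right) \\
&= \prod_{\substack{q \in B(H) \\ q = (\pi(t), j) \\ t < \pi^{-1}(j)}} (M_q v_{a_j})^* \left( \sum_{i=1}^n u_i u_i^* \right) M_q v_{a_j} \\
&= \prod_{\substack{q \in B(H) \\ q = (\pi(t), j) \\ t < \pi^{-1}(j)}} \|M_q v_{a_j}\|^2 \tag{11} \\
&\le \prod_{\substack{q \in B(H) \\ q = (\pi(t), j) \\ t < \pi^{-1}(j)}} \|v_{a_j}\|^2, \tag{12}
\end{align}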


where Eq. (10) used that $\|v_{a_{\pi(t)}}\| \le 1$, Eq. (11) used the identity $\sum_{i=1}^n u_i u_i^* = I$, and Eq. (12) used
that $\|M_q\| \le 1$. Now consider processing the alternating sum-product in Eq. (9) from right to left. We say
that a bond $(i,j) \in B(H)$ is assigned to $i$ if $\pi^{-1}(i) < \pi^{-1}(j)$. When arriving at the $t$th sum-product and using
the upper bound of Eqs. (10)-(12) on the previous $t - 1$ sum-products, we will have a sum over $\|v_{a_{\pi(t)}}\|^2$
raised to some nonnegative power (specifically the number of bonds incident upon $\pi(t)$ but not
assigned to $\pi(t)$, plus one if $\pi(t)$ has a self-loop) multiplied by a product of $\langle v_{a_{\pi(t)}}, v_{a_j} \rangle^2$ over all
bonds $(\pi(t), j)$ assigned to $\pi(t)$. There are two cases. In the first case $\pi(t)$ has no bonds assigned
to it. We will ignore this case since we will show that we can choose $\pi$ to avoid it.

The other case is that $\pi(t)$ has at least one bond assigned to it. In this case we are in the
scenario of Eqs. (10)-(12) and thus summing over $a_{\pi(t)}$ yields a non-empty product of $\|v_{a_j}\|^2$ for the $j$ for
which $(\pi(t), j)$ is a bond assigned to $\pi(t)$. Thus in our final sum, as long as we choose $\pi$ to avoid
the first case, we are left with an upper bound of $\|c\|$ raised to some power equal to the edge-degree
of vertex 1 in $H$, which is at least 2. The lemma would then follow since $\|c\|^j \le \|c\|^2$ for $j \ge 2$.

It now remains to show that we can choose $\pi$ to avoid the first case where some $t \in \{2, \ldots, N\}$
is such that $\pi(t)$ has no bonds assigned to it. Let $T$ be a spanning tree in $H$ rooted at vertex 1.
We then choose any $\pi$ with the property that for any $i < j$, $\pi(i)$ is not an ancestor of $\pi(j)$ in $T$.
This can be achieved, for example, by assigning $\pi$ values in reverse breadth first search order. ■
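As a small numerical sanity check of Lemma 10 (not part of the proof), the sketch below evaluates $f(H)$ for one concrete graph: the path $1 - 2 - 3$ with every edge doubled and all edge-matrices equal to the identity. The graph, the sizes `n` and `d`, and the helper name `f_doubled_path` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def f_doubled_path(U, c):
    """f(H) for the path 1-2-3 with each edge doubled and every M_e = I.

    Vertex 1 carries the fixed vector c; vertices 2 and 3 range over rows of U:
    f(H) = sum_{a2, a3} <c, u_{a2}>^2 * <u_{a2}, u_{a3}>^2.
    """
    gram = U @ U.T                  # gram[a, b] = <u_a, u_b>
    first_edge = (U @ c) ** 2       # <c, u_{a2}>^2 for every a2
    second_edge = (gram ** 2).sum(axis=1)  # sum over a3 of <u_{a2}, u_{a3}>^2
    return float(first_edge @ second_edge)

rng = np.random.default_rng(0)
n, d = 40, 4
U, _ = np.linalg.qr(rng.standard_normal((n, d)))  # n x d, orthonormal columns
c = rng.standard_normal(d)
c /= 2 * np.linalg.norm(c)                         # ||c|| = 1/2 <= 1

value = f_doubled_path(U, c)
bound = float(c @ c)                               # ||c||^2
print(value, "<=", bound)
```

Both bonds have multiplicity 2, so the hypotheses of the lemma hold for this graph and the printed value should not exceed $\|c\|^2$.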



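Lemma 11. Let $G$ be any dot product graph as in Eq. (7). Then

\[ \left| \sum_{\substack{a_1, \ldots, a_y \in [n] \\ \forall i \ne j\ a_i \ne a_j}} \prod_{\substack{e \in G \\ e = (i,j)}} \langle u_{a_i}, u_{a_j} \rangle \right| \le y! \cdot d^{y - w + 1}. \]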



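Proof. We first note that we have the inequality

\begin{align*}
\left| \sum_{\substack{a_1, \ldots, a_y \in [n] \\ \forall i \ne j\ a_i \ne a_j}} \prod_{\substack{e \in E(G) \\ e = (i,j)}} \langle u_{a_i}, u_{a_j} \rangle \right|
&= \left| \sum_{\substack{a_1, \ldots, a_{y-1} \in [n] \\ \forall i \ne j \in [y-1]\ a_i \ne a_j}} \left( \sum_{a_y = 1}^n \prod_{\substack{e \in E(G) \\ e = (i,j)}} \langle u_{a_i}, u_{a_j} \rangle - \sum_{t=1}^{y-1} \sum_{a_y = a_t} \prod_{\substack{e \in E(G) \\ e = (i,j)}} \langle u_{a_i}, u_{a_j} \rangle \right) \right| \\
&\le \left| \sum_{\substack{a_1, \ldots, a_{y-1} \in [n] \\ \forall i \ne j \in [y-1]\ a_i \ne a_j}} \sum_{a_y = 1}^n \prod_{\substack{e \in E(G) \\ e = (i,j)}} \langle u_{a_i}, u_{a_j} \rangle \right|
+ \sum_{t=1}^{y-1} \left| \sum_{\substack{a_1, \ldots, a_{y-1} \in [n] \\ \forall i \ne j \in [y-1]\ a_i \ne a_j}} \sum_{a_y = a_t} \prod_{\substack{e \in E(G) \\ e = (i,j)}} \langle u_{a_i}, u_{a_j} \rangle \right|.
\end{align*}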



We can view the sum over $t$ on the right hand side of the above as creating $y - 1$ new dot product
multigraphs, each with one fewer vertex where we eliminated vertex $y$ and associated it with vertex
$t$ for some $t$, and for each edge $(y, a)$ we effectively replaced it with $(t, a)$. Also in the first sum, where
we sum over all $n$ values of $a_y$, we have eliminated the constraints $a_y \ne a_i$ for $i \ne y$. By recursively
applying this inequality to each of the resulting summations, we bound

\[ \left| \sum_{\substack{a_1, \ldots, a_y \in [n] \\ \forall i \ne j\ a_i \ne a_j}} \prod_{\substack{e \in E(G) \\ e = (i,j)}} \langle u_{a_i}, u_{a_j} \rangle \right| \]

by a sum of contributions from $y!$ dot product multigraphs where in none of these multigraphs
do we have the constraint that $a_i \ne a_j$ for $i \ne j$. We will show that each one of these resulting
multigraphs contributes at most $d^{y-w+1}$, from which the lemma follows.
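The vertex-elimination step above can be checked by brute force on a tiny instance. The sketch below is only an illustration under assumed parameters (a triangle on $y = 3$ vertices, $n = 6$, $d = 3$; the helper names are hypothetical): it confirms that the sum with all indices distinct equals the unconstrained sum over the last index minus the $y - 1$ merged sums, and that the triangle inequality bound holds.

```python
import itertools
import numpy as np

def edge_product(gram, a, edges):
    """Product over edges (i, j) of <u_{a_i}, u_{a_j}>, read off the Gram matrix."""
    out = 1.0
    for i, j in edges:
        out *= gram[a[i], a[j]]
    return out

def split_last_vertex(gram, y, edges):
    """Return (free, merged): the sum with a_y unconstrained, and for each
    t < y the sum with a_y forced equal to a_t. The first y-1 indices stay
    pairwise distinct in both."""
    n = gram.shape[0]
    free, merged = 0.0, [0.0] * (y - 1)
    for head in itertools.permutations(range(n), y - 1):
        for last in range(n):
            free += edge_product(gram, head + (last,), edges)
        for t in range(y - 1):
            merged[t] += edge_product(gram, head + (head[t],), edges)
    return free, merged

rng = np.random.default_rng(1)
U, _ = np.linalg.qr(rng.standard_normal((6, 3)))  # rows u_1..u_6, orthonormal columns
gram = U @ U.T
edges = [(0, 1), (1, 2), (0, 2)]                  # a triangle on y = 3 vertices
y = 3

# Left-hand side: all y indices pairwise distinct.
constrained = sum(edge_product(gram, a, edges)
                  for a in itertools.permutations(range(6), y))
free, merged = split_last_vertex(gram, y, edges)
print(constrained, free - sum(merged))
```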

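Let $G'$ be one of the dot product multigraphs at a leaf of the above recursion so that we now
wish to bound

\[ F(G') \stackrel{\text{def}}{=} \left| \sum_{a_1, \ldots, a_{|V(G')|} = 1}^n \prod_{\substack{e \in E(G') \\ e = (i,j)}} \langle u_{a_i}, M_e u_{a_j} \rangle \right|, \tag{13} \]

where $M_e = I$ for all $e$ for $G'$. Before proceeding, we first claim that every connected component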
of G' is Eulerian. To see this, observe G has an Eulerian tour, by following the edges of G in 
increasing order of label, and thus all middle vertices have even edge-degree in G. However they 
also have even edge-degree in MR{G), and thus the edge-degree of a middle vertex in LM{G) must 
be even as well. Thus, every vertex in G has even edge-degree, and thus every vertex in each of 
the recursively created leaf graphs also has even edge-degree since at every step when we eliminate 
a vertex, some other vertex's degree increases by the eliminated vertex's degree which was even. 
Thus every connected component of G' is Eulerian as desired. 

We now upper bound $F(G')$. Let the connected components of $G'$ be $C_1, \ldots, C_{CC(G')}$, where
$CC(\cdot)$ counts connected components. An observation we repeatedly use later is that for any
generalized dot product multigraph $H$ with components $C_1, \ldots, C_{CC(H)}$,

\[ F(H) = \prod_{i=1}^{CC(H)} F(C_i). \tag{14} \]

We treat $G'$ as a generalized dot product multigraph so that each edge $e$ has an associated matrix
$M_e$ (though in fact $M_e = I$ for all $e$). Define an undirected multigraph to be good if all its connected
components have two edge-disjoint spanning trees. We will show that $F(G') \le F(G'')$ for some
generalized dot product multigraph $G''$ that is good, and then show $F(G'') \le d^{y-w+1}$. If $G'$ itself is
good then we can set $G'' = G'$. Otherwise, we will show $F(G') = F(H_0) = \ldots = F(H_\tau)$ for smaller
and smaller generalized dot product multigraphs $H_t$ (i.e. with successively fewer vertices) whilst
maintaining the invariant that each $H_t$ has Eulerian connected components and has $\|M_e\| \le 1$ for
all $e$. We stop when some $H_\tau$ is good and we can set $G'' = H_\tau$.

Let us now focus on constructing this sequence of $H_t$ in the case that $G'$ is not good. Let
$H_0 = G'$. Suppose we have constructed $H_0, \ldots, H_{t-1}$ for $t \ge 1$, none of which are good, and now we
want to construct $H_t$. Since $H_{t-1}$ is not good it cannot be 4-edge-connected by Corollary 7, so there
is some connected component $C_{j^*}$ of $H_{t-1}$ with some cut $S \subset V(C_{j^*})$ with 2 edges crossing the cut
$(S, V(C_{j^*}) \setminus S)$ (note that since $C_{j^*}$ is Eulerian, any cut has an even number of edges crossing it).
Choose such an $S \subset V(C_{j^*})$ with $|S|$ minimum amongst all such cuts. Let the two edges crossing
the cut be $(g, h), (g', h')$ with $h, h' \in S$ (note that it may be the case that $g = g'$ and/or $h = h'$).
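Note that $F(C_{j^*})$ equals the magnitude of

\[
\sum_{a_{V(C_{j^*}) \setminus S} \in [n]^{|V(C_{j^*}) \setminus S|}} u_{a_g}^* \Bigg( \prod_{\substack{e \in E(V(C_{j^*}) \setminus S) \\ e = (i,j)}} \langle u_{a_i}, M_e u_{a_j} \rangle \Bigg)
\underbrace{ M_{(g,h)} \Bigg( \sum_{a_S \in [n]^{|S|}} u_{a_h} \Bigg( \prod_{\substack{e \in E(C_{j^*}(S)) \\ e = (i,j)}} \langle u_{a_i}, M_e u_{a_j} \rangle \Bigg) u_{a_{h'}}^* \Bigg) M_{(h',g')} }_{M}\, u_{a_{g'}}. \tag{15}
\]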

We define $H_t$ to be $H_{t-1}$ but where in the $j^*$th component we replace $C_{j^*}$ with $C_{j^*}(V(C_{j^*}) \setminus S)$
and add an additional edge from $g$ to $g'$ which we assign edge-matrix $M$. We thus have that
$F(H_{t-1}) = F(H_t)$ by Eq. (14). Furthermore each component of $H_t$ is still Eulerian since every
vertex in $H_{t-1}$ has either been eliminated, or its edge-degree has been preserved and thus all
edge-degrees are even. It remains to show that $\|M\| \le 1$.

We first claim that $C_{j^*}(S)$ has two edge-disjoint spanning trees. Define $C'$ to be the graph
$C_{j^*}(S)$ with an edge from $h$ to $h'$ added. We show that $C'(S)$ is 4-edge-connected so that $C_{j^*}(S)$
has two edge-disjoint spanning trees by Corollary 7. Now to see this, consider some $S' \subset S$.
Consider the cut $(S', V(C') \setminus S')$. $C'$ is Eulerian, so the number of edges crossing this cut is either
2 or at least 4. If it is 2, then since $|S'| < |S|$ this is a contradiction since $S$ was chosen amongst such
cuts to have $|S|$ minimum. Thus it is at least 4, and we claim that the number of edges crossing
the cut $(S', S \setminus S')$ in $C'(S)$ must also be at least 4. If not, then it is 2 since $C'(S)$ is Eulerian.
However since the number of edges leaving $S'$ in $C'$ is at least 4, it must then be that $h, h' \in S'$.
But then the cut $(S \setminus S', V(C') \setminus (S \setminus S'))$ has 2 edges crossing it so that $S \setminus S'$ is a smaller cut than
$S$ with 2 edges leaving it in $C'$, violating the minimality of $|S|$, a contradiction. Thus $C'(S)$ is
4-edge-connected, implying $C_{j^*}(S)$ has two edge-disjoint spanning trees $T_1, T_2$ as desired.

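Now to show $\|M\| \le 1$, by Fact 8 we have $\|M\| = \sup_{\|x\| = \|x'\| = 1} x^* M x'$. We have that

\begin{align}
x^* M x' &= \sum_{a_S \in [n]^{|S|}} \langle x, M_{(g,h)} u_{a_h} \rangle \cdot \Bigg( \prod_{\substack{e \in E(C_{j^*}(S)) \\ e = (i,j)}} \langle u_{a_i}, M_e u_{a_j} \rangle \Bigg) \cdot \langle u_{a_{h'}}, M_{(h',g')} x' \rangle \nonumber \\
&= \sum_{a_S \in [n]^{|S|}} \Bigg( \langle x, M_{(g,h)} u_{a_h} \rangle \cdot \prod_{\substack{e \in T_1 \\ e = (i,j)}} \langle u_{a_i}, M_e u_{a_j} \rangle \Bigg) \cdot \Bigg( \langle u_{a_{h'}}, M_{(h',g')} x' \rangle \cdot \prod_{\substack{e \in E(C_{j^*}(S)) \setminus T_1 \\ e = (i,j)}} \langle u_{a_i}, M_e u_{a_j} \rangle \Bigg) \nonumber \\
&\le \frac{1}{2} \Bigg[ \sum_{a_S \in [n]^{|S|}} \langle x, M_{(g,h)} u_{a_h} \rangle^2 \cdot \prod_{\substack{e \in T_1 \\ e = (i,j)}} \langle u_{a_i}, M_e u_{a_j} \rangle^2 + \sum_{a_S \in [n]^{|S|}} \langle u_{a_{h'}}, M_{(h',g')} x' \rangle^2 \cdot \prod_{\substack{e \in E(C_{j^*}(S)) \setminus T_1 \\ e = (i,j)}} \langle u_{a_i}, M_e u_{a_j} \rangle^2 \Bigg] \tag{16} \\
&\le \frac{1}{2} \left( \|x\|^2 + \|x'\|^2 \right) \tag{17} \\
&= 1, \nonumber
\end{align}

where Eq. (16) used the AM-GM inequality, and Eq. (17) used Lemma 10 (note the graph with vertex
set $S \cup \{g'\}$ and edge set $E(C_{j^*}(S)) \setminus T_1 \cup \{(g', h')\}$ is connected since $T_2 \subseteq E(C_{j^*}(S)) \setminus T_1$). Thus
we have shown that $H_t$ satisfies the desired properties. Now notice that the sequence $H_0, \ldots, H_t, \ldots$
must eventually terminate since the number of vertices is strictly decreasing in this sequence and
any Eulerian graph on 2 vertices is good. Therefore we have that $H_\tau$ is eventually good for some
$\tau > 0$ and we can set $G'' = H_\tau$.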

It remains to show that for our final good $G''$ we have $F(G'') \le d^{y-w+1}$. We will show this in
two parts by showing that both $CC(G'') \le y - w + 1$ and $F(G'') \le d^{CC(G'')}$. For the first claim, note
that $CC(G'') \le CC(G)$ since every $H_t$ has the same number of connected components as $G'$, and
$CC(G') \le CC(G)$. This latter inequality holds since in each level of recursion used to eventually
obtain $G'$ from $G$, we repeatedly identified two vertices as equal and merged them, which can only
decrease the number of connected components. Now, all middle vertices in $G$ lie in one connected
component (since $G$ is connected) and $MR(G)$ has $w$ connected components. Thus the at least
$w - 1$ edges connecting these components in $G$ must come from $LM(G)$, implying that $LM(G)$
(and thus $G$) has at most $y - w + 1$ connected components, which thus must also be true for $G''$ as
argued above.

It only remains to show $F(G'') \le d^{CC(G'')}$. Let $G''$ have connected components $C_1, \ldots, C_{CC(G'')}$
with each $C_t$ having 2 edge-disjoint spanning trees $T_1^t, T_2^t$. We then have

\begin{align}
F(G'') &= \prod_{t=1}^{CC(G'')} F(C_t) \nonumber \\
&= \prod_{t=1}^{CC(G'')} \Bigg| \sum_{a_1, \ldots, a_{|V(C_t)|} = 1}^n \prod_{\substack{e \in E(C_t) \\ e = (i,j)}} \langle u_{a_i}, M_e u_{a_j} \rangle \Bigg| \nonumber \\
&= \prod_{t=1}^{CC(G'')} \Bigg| \sum_{a_1, \ldots, a_{|V(C_t)|} = 1}^n \Bigg( \prod_{\substack{e \in T_1^t \\ e = (i,j)}} \langle u_{a_i}, M_e u_{a_j} \rangle \Bigg) \cdot \Bigg( \prod_{\substack{e \in E(C_t) \setminus T_1^t \\ e = (i,j)}} \langle u_{a_i}, M_e u_{a_j} \rangle \Bigg) \Bigg| \nonumber \\
&\le \prod_{t=1}^{CC(G'')} \frac{1}{2} \Bigg[ \sum_{a_1 = 1}^n \sum_{a_2, \ldots, a_{|V(C_t)|} = 1}^n \prod_{\substack{e \in T_1^t \\ e = (i,j)}} \langle u_{a_i}, M_e u_{a_j} \rangle^2 + \sum_{a_1 = 1}^n \sum_{a_2, \ldots, a_{|V(C_t)|} = 1}^n \prod_{\substack{e \in E(C_t) \setminus T_1^t \\ e = (i,j)}} \langle u_{a_i}, M_e u_{a_j} \rangle^2 \Bigg] \tag{18} \\
&\le \prod_{t=1}^{CC(G'')} \sum_{a_1 = 1}^n \|u_{a_1}\|^2 \tag{19} \\
&= d^{CC(G'')}, \nonumber
\end{align}

where Eq. (18) used the AM-GM inequality, and Eq. (19) used Lemma 10, which applies since
$V(C_t)$ with edge set $T_1^t$ is connected, and $V(C_t)$ with edge set $E(C_t) \setminus T_1^t$ is connected (since
$T_2^t \subseteq E(C_t) \setminus T_1^t$).
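To make the bound $F(C_t) \le d$ for a good component concrete, the sketch below evaluates $F$ numerically on two small Eulerian multigraphs that each contain two edge-disjoint spanning trees: a pair of vertices joined by a double edge, and a triangle with every edge doubled. All edge-matrices are taken to be the identity; the graphs, sizes, and function names are illustrative assumptions rather than objects from the paper.

```python
import numpy as np

def F_double_edge(U):
    """F for two vertices joined by two parallel edges (M_e = I):
    |sum_{a,b} <u_a, u_b>^2|, which equals ||U U^T||_F^2."""
    gram = U @ U.T
    return abs(float((gram ** 2).sum()))

def F_doubled_triangle(U):
    """F for a triangle with each edge doubled (M_e = I):
    |sum_{a,b,c} <u_a,u_b>^2 <u_b,u_c>^2 <u_c,u_a>^2|."""
    sq = (U @ U.T) ** 2
    return abs(float(np.einsum("ab,bc,ca->", sq, sq, sq)))

rng = np.random.default_rng(2)
n, d = 30, 5
U, _ = np.linalg.qr(rng.standard_normal((n, d)))  # orthonormal columns

double_edge = F_double_edge(U)
triangle = F_doubled_triangle(U)
print(double_edge, triangle, "both should be at most", d)
```

For the double edge the value is exactly $\|UU^*\|_F^2 = d$, so the bound of Eq. (19) is tight there; for the doubled triangle the value is strictly smaller in typical instances.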

Now, for any $G \in \mathcal{G}$ we have $y + z \le b + w$ since for any graph the number of edges plus the
number of connected components is at least the number of vertices. We also have $b \ge 2z$ since
every right vertex of $G$ is incident upon at least two distinct bonds (since $i_t \ne j_t$ for all $t$). We
also have $y \le b \le \ell$ since $MR(G)$ has exactly $2\ell$ edges with no isolated vertices, and every bond
has even multiplicity. Finally, a crude bound on the number of different $G \in \mathcal{G}$ with a given $b, y, z$
is $(zy^2)^\ell \le (b^3)^\ell$. This is because when drawing the graph edges in increasing order of edge label,
when at a left vertex, we draw edges from the left to the middle, then to the right, then to the
middle, and then back to the left again, giving $y^2 z$ choices. This is done $\ell$ times. Thus by Lemma 11
and Eq. (7), and using that $t! \le e\sqrt{t}\,(t/e)^t$ for all $t \ge 1$,

Etr((5 - if) <d--^ Yl Yl y--^^- "'"'^ • ^^"'^ 

b,y,z,w G&Q 

b{G)=b,y{G)=y 
w{G)=w,z{G)=z 

b,y,z,w GaQ 

b{G)=b,y{G)=y 
w{G)=w,z{G)=z 

b,y,z,w 

b,y,z,w \ / 

<««'v/*.max(^)'"((6Ve)\/|] (20) 

Define e = 2e - e^. For £ > ln{ed£^/^/6) = 0{ln{d/6)), s > e£^/e = 0{log{d/6y /e), and 
m > d£^ /e^ = 0{d\og{d/ 5)^ / e^), the above expression is at most 5e^ . Thus as in Eq. ([2]), by Eq. ([5]) 
we have 

P (||5 - /|| > e) < ^ • Etr((5 - if) < 6. 

■ 
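The proof above relies on the factorial bound $t! \le e\sqrt{t}\,(t/e)^t$ for all $t \ge 1$. The short check below verifies it numerically for small $t$; the range tested and the helper name `stirling_upper` are arbitrary choices for illustration, and a tiny relative tolerance absorbs floating-point rounding at $t = 1$, where the bound holds with equality.

```python
import math

def stirling_upper(t):
    """Upper bound e * sqrt(t) * (t / e)^t on t!, valid for integers t >= 1."""
    return math.e * math.sqrt(t) * (t / math.e) ** t

# Collect every t in the tested range where the bound would fail.
violations = [t for t in range(1, 101)
              if math.factorial(t) > stirling_upper(t) * (1 + 1e-12)]
print("violations:", violations)
```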

The proof of Theorem 9 reveals that for $\delta = 1/\mathrm{poly}(d)$ one could also set $m = O(d^{1+\gamma}/\varepsilon^2)$ and
$s = O_\gamma(1/\varepsilon)$ for any fixed constant $\gamma > 0$ and arrive at the same conclusion. Indeed, let $\gamma' < \gamma$ be
any positive constant. Let $\ell$ in the proof of Theorem 9 be taken as $O(\log(d/\delta)) = O(\log d)$. It suffices
to ensure max2<b<£(6Vs)^"'' • {{b^^j e)^fdlmf < e^6/{ed£^/'^) by Eq. ([20]). Note d^' > b^^ as long as 
6/ln6 > 37-1^/ In d = 0(1/7'), so d^i' > b^^ for b > b* for some b* = 6(7-7 log(l/7)). We choose 
s > e(6*)^/e and ?n = d^~^^ /e^ , which is at least d^'^"' £^ /e^ioY d larger than some fixed constant. 
Thus the max above is always as small as desired, which can be seen by looking at 6 < 6* and 
b > b* separately (in the former case b^/s < 1/e, and in the latter case {b^ /sY~^ ■ {{b^ / e) \/ d / mf' < 
{e/eYb^^d~'^ ^ = {e/eYe^^^'^^~'' ''^'^'^ < (e/e)^ is as small as desired). This observation yields: 
Theorem 12. Let $\alpha, \gamma > 0$ be arbitrary constants. For $\Pi$ an OSNAP with $s = \Theta(1/\varepsilon)$ and
$\varepsilon \in (0, 1)$, with probability at least $1 - 1/d^\alpha$, all singular values of $\Pi U$ are $1 \pm \varepsilon$ for $m = \Omega(d^{1+\gamma}/\varepsilon^2)$
and $\sigma, h$ being $\Omega(\log d)$-wise independent. The constants in the big-$\Theta$ and big-$\Omega$ depend on $\alpha, \gamma$.
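For intuition, the following sketch samples a sparse sign matrix with exactly $s$ non-zero entries of magnitude $1/\sqrt{s}$ per column and inspects the singular values of $\Pi U$. It is a simplified stand-in under stated assumptions: it uses fully independent randomness rather than the limited-independence hash functions $\sigma, h$ of the theorem, the sizes are arbitrary, and `sample_sparse_embedding` is a hypothetical helper name.

```python
import numpy as np

def sample_sparse_embedding(m, n, s, rng):
    """m x n matrix with exactly s non-zeros per column, each +-1/sqrt(s).

    Assumption: rows and signs are drawn with full independence, which is
    stronger than the O(log d)-wise independence used in the analysis.
    """
    Pi = np.zeros((m, n))
    for col in range(n):
        rows = rng.choice(m, size=s, replace=False)   # s distinct rows
        signs = rng.choice([-1.0, 1.0], size=s)
        Pi[rows, col] = signs / np.sqrt(s)
    return Pi

rng = np.random.default_rng(3)
n, d, m, s = 300, 5, 2000, 8
U, _ = np.linalg.qr(rng.standard_normal((n, d)))      # orthonormal basis of a subspace
Pi = sample_sparse_embedding(m, n, s, rng)

nonzeros_per_column = (Pi != 0).sum(axis=0)
singular_values = np.linalg.svd(Pi @ U, compute_uv=False)
print(singular_values)
```

Every column of `Pi` has unit Euclidean norm by construction, so the length of each standard basis vector is preserved exactly; the singular values of `Pi @ U` indicate how well the whole subspace is preserved.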

Remark 13. Section 1 stated the time to list all non-zeroes in a column in Theorem 9 is $t_c = \tilde{O}(s)$.
For $\delta = 1/\mathrm{poly}(d)$, naively one would actually achieve $t_c = O(s \cdot \log d)$ since one needs
to evaluate an $O(\log d)$-wise independent hash function $s$ times. This can be improved to $\tilde{O}(s)$
using fast multipoint evaluation of hash functions; see for example the last paragraph of Remark
16 of [KNPW11].

3 Applications 

We use the fact that many matrix problems have the same time complexity as matrix multiplication,
including computing the matrix inverse [BH74], [Har08, Appendix A], and QR decomposition
[Sch73]. In this paper we only consider the real RAM model and state the running time
in terms of the number of field operations. The algorithms for solving linear systems, computing
inverse, QR decomposition, and approximating SVD based on fast matrix multiplication can be
implemented with precision comparable to that of conventional algorithms to achieve the same
error bound (with a suitable notion of approximation/stability). We refer readers to [DDH07] for
details. Notice that it is possible that both algorithms based on fast matrix multiplication and
their conventional counterparts are unstable; see e.g. [AV97] for an example of a pathological matrix
with very high condition number.

In this section we describe some applications of our subspace embeddings to problems in
numerical linear algebra. All applications follow from a straightforward replacement of previously
used embeddings with our new ones as most proofs go through verbatim. In the statement of
our bounds we implicitly assume $\mathrm{nnz}(A) \ge n$, since otherwise fully zero rows of $A$ can be ignored
without affecting the problem solution.

3.1 Approximate Leverage Scores 

This section describes the application of our subspace embedding from Theorem 9 or Theorem 12
to approximating the leverage scores. Consider a matrix $A$ of size $n \times d$ and rank $r$. Let $U$ be an $n \times r$
matrix whose columns form an orthonormal basis of the column space of $A$. The leverage scores of
$A$ are the squared lengths of the rows of $U$. The algorithm for approximating the leverage scores
and the analysis are the same as those of [CW12], which itself uses essentially the same algorithm
outline as Algorithm 1 of [DMIMW12]. The improved bound is stated below (cf. [CW12, Theorem
21]).

Theorem 14. For any constant $\varepsilon > 0$, there is an algorithm that with probability at least 2/3,
approximates all leverage scores of an $n \times d$ matrix $A$ in time $\tilde{O}(\mathrm{nnz}(A)/\varepsilon^2 + r^\omega \varepsilon^{-2\omega})$.

Proof. As in [CW12], this follows by replacing the Fast Johnson-Lindenstrauss embedding used
in [DMIMW12] with our sparse subspace embeddings. The only difference is in the parameters
of our OSNAPs. We essentially repeat the argument verbatim just to illustrate where our new
OSE parameters fit in; nothing in this proof is new. Now, we first use [CKL12] so that we can
assume $A$ has only $r = \mathrm{rank}(A)$ columns and is of full column rank. Then, we take an OSNAP
$\Pi$ with $m = \tilde{O}(r/\varepsilon^2)$, $s = (\mathrm{polylog}\, r)/\varepsilon$ and compute $\Pi A$. We then find $R^{-1}$ so that $\Pi A R^{-1}$ has
orthonormal columns. The analysis of [DMIMW12] shows that the squared $\ell_2$ norms of the rows of $AR^{-1}$ are $1 \pm \varepsilon$
times the leverage scores of $A$. Take $\Pi' \in \mathbb{R}^{r \times t}$ to be a JL matrix that preserves the $\ell_2$ norms of
the $n$ rows of $AR^{-1}$ up to $1 \pm \varepsilon$. Finally, compute $R^{-1}\Pi'$, then $A(R^{-1}\Pi')$, and output the squared
row norms of $AR^{-1}\Pi'$.

Now we bound the running time. The time to reduce $A$ to having $r$ linearly independent columns
is $O((\mathrm{nnz}(A) + r^\omega) \log n)$. $\Pi A$ can be computed in time $O(\mathrm{nnz}(A) \cdot (\mathrm{polylog}\, r)/\varepsilon)$. Computing
$R \in \mathbb{R}^{r \times r}$ from the QR decomposition takes time $\tilde{O}(m^\omega) = \tilde{O}(r^\omega/\varepsilon^{2\omega})$, and then $R$ can be inverted
in time $\tilde{O}(r^\omega)$; note $\Pi A R^{-1}$ has orthonormal columns. Computing $R^{-1}\Pi'$ column by column
takes time $O(r^2 \log r)$ using the FJLT of [AL11, KW11] with $t = O(\varepsilon^{-2} \log n (\log\log n)^4)$. We then
multiply the matrix $A$ by the $r \times t$ matrix $R^{-1}\Pi'$, which takes time $O(t \cdot \mathrm{nnz}(A)) = \tilde{O}(\mathrm{nnz}(A)/\varepsilon^2)$.
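The pipeline in the proof (sketch $A$, take $R$ from a QR decomposition of $\Pi A$, apply a second JL map, and read off squared row norms) can be sketched in a few lines. This is an illustration under assumptions, not the algorithm as analyzed: $A$ is taken to have full column rank already, the sparse embedding uses fully independent randomness, a dense Gaussian matrix stands in for the FJLT $\Pi'$, and the function names and sizes are hypothetical.

```python
import numpy as np

def sample_sparse_embedding(m, n, s, rng):
    """m x n matrix, s non-zeros of value +-1/sqrt(s) per column (fully random)."""
    Pi = np.zeros((m, n))
    for col in range(n):
        rows = rng.choice(m, size=s, replace=False)
        Pi[rows, col] = rng.choice([-1.0, 1.0], size=s) / np.sqrt(s)
    return Pi

def approx_leverage_scores(A, m, s, t, rng):
    """Approximate leverage scores of a full-column-rank n x r matrix A."""
    n, r = A.shape
    Pi = sample_sparse_embedding(m, n, s, rng)
    _, R = np.linalg.qr(Pi @ A)                 # Pi A R^{-1} has orthonormal columns
    R_inv = np.linalg.inv(R)
    # Assumption: dense Gaussian JL map used in place of the FJLT Pi'.
    jl = rng.standard_normal((r, t)) / np.sqrt(t)
    B = A @ (R_inv @ jl)                        # multiply R^{-1} Pi' first, then A
    return (B ** 2).sum(axis=1)                 # squared row norms of A R^{-1} Pi'

rng = np.random.default_rng(4)
n, r = 300, 5
A = rng.standard_normal((n, r))

Q, _ = np.linalg.qr(A)
exact = (Q ** 2).sum(axis=1)                    # exact leverage scores
approx = approx_leverage_scores(A, m=3000, s=8, t=3000, rng=rng)
max_rel_err = float(np.max(np.abs(approx - exact) / exact))
print("max relative error:", max_rel_err)
```

Multiplying $R^{-1}\Pi'$ before touching $A$ mirrors the order of operations in the running-time analysis, where the only pass over $A$ costs $O(t \cdot \mathrm{nnz}(A))$.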



3.2 Least Squares Regression 

In this section, we describe the application of our subspace embeddings to the problem of least
squares regression. Here given a matrix $A$ of size $n \times d$ and a vector $b \in \mathbb{R}^n$, the objective is
to find $x \in \mathbb{R}^d$ minimizing $\|Ax - b\|_2$. The reduction to subspace embedding is similar to those
of [CW12, Sar06]. The proof is included for completeness.

Theorem 15. There is an algorithm for least squares regression running in time $O(\mathrm{nnz}(A) +
d^3 \log(d/\varepsilon)/\varepsilon^2)$ and succeeding with probability at least 2/3.

Proof. Applying Theorem 3 to the subspace spanned by columns of $A$ and $b$, we get a distribution
over matrices $\Pi$ of size $O(d^2/\varepsilon^2) \times n$ such that $\Pi$ preserves lengths of vectors in the subspace up
to a factor $1 \pm \varepsilon$ with probability at least 5/6. Thus, we only need to find $\mathrm{argmin}_x \|\Pi A x - \Pi b\|_2$.
Note that $\Pi A$ has size $O(d^2/\varepsilon^2) \times d$. By Theorem 12 of [Sar06], there is an algorithm that with
probability at least 5/6, finds a $1 \pm \varepsilon$ approximate solution for least squares regression for the
smaller input of $\Pi A$ and $\Pi b$ and runs in time $O(d^3 \log(d/\varepsilon)/\varepsilon^2)$. ■
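The reduction in this proof is the usual sketch-and-solve pattern, illustrated below with an $s = 1$ embedding (one random signed entry per column of $\Pi$). The sketch is an assumption-laden illustration: it uses fully independent randomness, an ordinary dense least squares solver on the sketched problem instead of the algorithm of [Sar06], and arbitrary problem sizes; `sparse_sketch` is a hypothetical helper.

```python
import numpy as np

def sparse_sketch(M, m, rng):
    """Apply an m x n embedding with one +-1 entry per column to the n-row
    matrix M in time proportional to the size of M (s = 1 case)."""
    n = M.shape[0]
    rows = rng.integers(0, m, size=n)           # target row for each input row
    signs = rng.choice([-1.0, 1.0], size=n)
    out = np.zeros((m, M.shape[1]))
    np.add.at(out, rows, signs[:, None] * M)    # accumulate signed rows
    return out

rng = np.random.default_rng(5)
n, d, m = 5000, 5, 2000
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.5 * rng.standard_normal(n)

# Sketch [A | b] once so that A and b see the same embedding Pi.
sketched = sparse_sketch(np.column_stack([A, b]), m, rng)
x_sketch = np.linalg.lstsq(sketched[:, :d], sketched[:, d], rcond=None)[0]
x_opt = np.linalg.lstsq(A, b, rcond=None)[0]

ratio = float(np.linalg.norm(A @ x_sketch - b) / np.linalg.norm(A @ x_opt - b))
print("residual ratio (sketched / optimal):", ratio)
```

The ratio is at least 1 by optimality of `x_opt`, and it stays close to 1 whenever $\Pi$ embeds the span of the columns of $A$ together with $b$.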

The following theorem follows from using the embedding of Theorem 9 and the same argument
as [CW12, Theorem 32].

Theorem 16. Let $r$ be the rank of $A$. There is an algorithm for least squares regression running in
time $O(\mathrm{nnz}(A)((\log r)^{O(1)} + \log(n/\varepsilon)) + r^\omega (\log r)^{O(1)} + r^2 \log(1/\varepsilon))$ and succeeding with probability
at least 2/3.

3.3 Low Rank Approximation 

In this section, we describe the application of our subspace embeddings to low rank approximation.
Here given a matrix $A$, one wants to find a rank $k$ matrix $A_k$ minimizing $\|A - A_k\|_F$. Let $\Delta_k$ be the
minimum $\|A - A_k\|_F$ over all rank $k$ matrices $A_k$. Notice that our matrices are of the same form as
sparse JL matrices considered by [KN12] so the following property holds for matrices constructed
in Theorem 9 (cf. [CW12, Lemma 24]).

Theorem 17. [KN12, Theorem 19] Fix $\varepsilon, \delta > 0$. Let $\mathcal{D}$ be the distribution over matrices given in
Theorem 9 with $n$ columns. For any matrices $A, B$ with $n$ rows,

\[ \mathbb{P}_{S \sim \mathcal{D}} \left[ \|A^T S^T S B - A^T B\|_F > \tfrac{3\varepsilon}{2} \|A\|_F \|B\|_F \right] < \delta. \]

The matrices of Theorem 3 are the same as those of [CW12] so the above property holds for them
as well. Therefore, the same algorithm and analysis as in [CW12] work. We state the improved
bounds using the embedding of Theorem 3 and Theorem 9 below (cf. [CW12, Theorem 36 and 38]).

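Theorem 18. Given a matrix $A$ of size $n \times n$, there are 2 algorithms that, with probability at
least 3/5, find 3 matrices $U, \Sigma, V$ where $U$ is of size $n \times k$, $\Sigma$ is of size $k \times k$, $V$ is of size $n \times k$,
$U^T U = V^T V = I_k$, $\Sigma$ is a diagonal matrix, and

\[ \|A - U \Sigma V^*\|_F \le (1 + \varepsilon) \Delta_k. \]

The first algorithm runs in time $O(\mathrm{nnz}(A)) + \tilde{O}(nk^2 + nk^{\omega-1}\varepsilon^{-1-\omega} + k^\omega \varepsilon^{-2-\omega})$. The second algorithm
runs in time $O(\mathrm{nnz}(A) \log^{O(1)} k) + \tilde{O}(nk^{\omega-1}\varepsilon^{-1-\omega} + k^\omega \varepsilon^{-2-\omega})$.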

Proof. The proof is essentially the same as that of [CW12] so we only mention the difference. We
use 2 bounds for the running time: multiplying an $a \times b$ matrix and a $b \times c$ matrix with $c > a$ takes
$O(a^{\omega-2}bc)$ time (simply dividing the matrices into $a \times a$ blocks), and approximating SVD for an
$a \times b$ matrix $M$ with $a > b$ takes $O(ab^{\omega-1})$ time (time to compute $M^T M$, approximate SVD of
$M^T M = QDQ^T$ in $O(b^\omega)$ time [DDH07], and compute $MQ$ to complete the SVD of $M$). ■
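The SVD route mentioned in the proof (form $M^T M$, diagonalize it as $QDQ^T$, then compute $MQ$) can be illustrated as follows. This is a minimal sketch assuming $M$ is tall with full column rank and is well conditioned, using a standard symmetric eigensolver rather than a fast-matrix-multiplication-based routine; `svd_via_gram` is a hypothetical name.

```python
import numpy as np

def svd_via_gram(M):
    """Thin SVD of a tall, full-column-rank M via the eigendecomposition of M^T M.

    Returns (left, sigma, Q) with M = left @ diag(sigma) @ Q.T.
    """
    evals, Q = np.linalg.eigh(M.T @ M)          # M^T M = Q D Q^T, eigenvalues ascending
    order = np.argsort(evals)[::-1]             # reorder to descending
    evals, Q = evals[order], Q[:, order]
    sigma = np.sqrt(np.clip(evals, 0.0, None))  # singular values are sqrt of eigenvalues
    left = (M @ Q) / sigma                      # normalize columns of M Q
    return left, sigma, Q

rng = np.random.default_rng(6)
M = rng.standard_normal((60, 6))
left, sigma, Q = svd_via_gram(M)
reference = np.linalg.svd(M, compute_uv=False)
print(np.max(np.abs(sigma - reference)))
```

Working through $M^T M$ squares the condition number, which is one reason the surrounding text speaks of approximating the SVD and defers stability questions to [DDH07].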